Source Retrieval Based on Learning to Rank and Text Alignment Based on Plagiarism Type Recognition for Plagiarism Detection

Authors

  • Leilei Kong
  • Yong Han
  • Zhongyuan Han
  • Haihao Yu
  • Qibo Wang
  • Tinglei Zhang
  • Haoliang Qi
Abstract

This paper treats the selection of query keywords in source retrieval as a learning-to-rank problem: a ranking model is learned to choose the keyword extraction method applied to each suspicious-document segment. Four basic methods are used in our ranking function: BM25, TFIDF, TF and EW. A ranking model based on Ranking SVM is then proposed to rank the groups of query keywords according to how much they contribute to a higher F-measure. In other words, the learning target of the ranking model is the best source-retrieval F-measure. For text alignment, we propose a novel method based on a plagiarism type recognition model, which applies a distinct detection strategy to each plagiarism type. The plagiarism type recognition model is based on logistic regression. Experimental results on the PAN 2014 plagiarism detection corpus indicate the effectiveness of the proposed methods.
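The ranking step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each candidate keyword group (one per extraction method) is assumed to be described by a small numeric feature vector. The training target of each group is assumed to be the source-retrieval F-measure it achieved. The feature values and the helper names (`f_measure`, `pairwise_examples`, `train_ranking_svm`, `best_method`) are hypothetical. The full-batch subgradient solver is also a stand-in for whatever Ranking SVM implementation the paper actually used. The sketch shows the pairwise transform behind Ranking SVM: candidates from the same segment are compared, and a linear model is trained so that the group with the higher F scores higher.

```python
from itertools import combinations

def f_measure(retrieved, relevant):
    """F1 of one retrieval run: harmonic mean of precision and recall over source ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)

def pairwise_examples(segments):
    """segments: list of dicts {method: (features, f_score)}, one dict per suspicious segment.
    Only candidates from the SAME segment are compared; each pair becomes a
    difference vector labelled +1 if the first candidate reached a higher F, else -1."""
    examples = []
    for candidates in segments:
        for (xa, fa), (xb, fb) in combinations(candidates.values(), 2):
            if fa == fb:
                continue  # a tie gives no ordering constraint
            diff = [a - b for a, b in zip(xa, xb)]
            examples.append((diff, 1 if fa > fb else -1))
    return examples

def train_ranking_svm(examples, dim, lam=0.01, lr=0.1, epochs=200):
    """Linear Ranking SVM: minimise lam/2*||w||^2 + mean(max(0, 1 - y * w.diff))
    by full-batch subgradient descent (a hypothetical solver choice)."""
    w = [0.0] * dim
    n = len(examples)
    for _ in range(epochs):
        grad = [lam * wk for wk in w]
        for diff, y in examples:
            margin = y * sum(wk * dk for wk, dk in zip(w, diff))
            if margin < 1:  # hinge loss is active for this pair
                for k in range(dim):
                    grad[k] -= y * diff[k] / n
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w

def best_method(w, candidates):
    """candidates: {method: features}. Return the method whose keyword group scores highest."""
    return max(candidates, key=lambda m: sum(wk * xk for wk, xk in zip(w, candidates[m])))

# Hypothetical toy data: feature vectors and F values are invented for illustration.
train = [
    {"BM25": ([0.9, 0.1], 0.8), "TFIDF": ([0.5, 0.3], 0.5), "TF": ([0.2, 0.5], 0.3)},
    {"TFIDF": ([0.7, 0.2], 0.6), "EW": ([0.1, 0.9], 0.1)},
]
w = train_ranking_svm(pairwise_examples(train), dim=2)
print(best_method(w, {"BM25": [0.3, 0.6], "TF": [0.8, 0.2]}))  # TF
```

Comparing only candidates within the same segment mirrors the per-query preferences of Ranking SVM, where orderings are defined within a query rather than across queries. Ties are dropped because they impose no ordering constraint.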

Similar resources

English-Persian Plagiarism Detection based on a Semantic Approach

Plagiarism which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them” poses a major challenge to knowledge spread publication. Plagiarism has been placed in four categories of direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism which is sometimes referred to as cross-li...

Approaches for Source Retrieval and Text Alignment of Plagiarism Detection Notebook for PAN at CLEF 2013

In this paper, we describe our approach at the PAN@CLEF2013 plagiarism detection competition. In the Source Retrieval sub-task, a method combining TF-IDF, PatTree and Weighted TF-IDF to extract the keywords of suspicious documents as queries to retrieve the plagiarism source document is proposed. In the Text Alignment sub-task, a method based on sentence similarity is presented. Our text alignment...

A Text Alignment Algorithm Based on Prediction of Obfuscation Types Using SVM Neural Network

In this paper, we describe our text alignment algorithm that achieved the first rank in Persian Plagdet 2016 competition. The Persian Plagdet corpus includes several obfuscation strategies. Information about the type of obfuscation helps plagiarism detection systems to use their most suitable algorithm for each type. For this purpose, we use SVM neural network for classification of documents ac...

External Plagiarism Detection based on Human Behaviors in Producing Paraphrases of Sentences in English and Persian Languages

With the advent of the internet and easy access to digital libraries, plagiarism has become a major issue. Applying search engines is one of the plagiarism detection techniques that converts plagiarism patterns to search queries. Generating suitable queries is the heart of this technique and existing methods suffer from lack of producing accurate queries, Precision and Speed of retrieved result...

Overview of the 6th International Competition on Plagiarism Detection

This paper overviews 17 plagiarism detectors that have been evaluated within the sixth international competition on plagiarism detection at PAN 2014. We report on their performances for the two tasks source retrieval and text alignment of external plagiarism detection. For the third year in a row, we invite software submissions instead of run submissions for this task, which allows for cross-ye...

Publication date: 2014